Abstract—Evolutionary algorithms (EAs) are very popular tools to design and evolve artificial neural networks (ANNs), especially to train them. These methods have advantages over the conventional backpropagation (BP) method because of their low computational requirement when searching in a large solution space. In this paper, we employ Chemical Reaction Optimization (CRO), a newly developed global optimization method, to replace BP in training neural networks. CRO is a population-based metaheuristic mimicking the transition of molecules and their interactions in a chemical reaction. Simulation results show that CRO outperforms many EA strategies commonly used to train neural networks.

Index Terms—Artificial neural networks, evolutionary algorithm, chemical reaction optimization.

I. Introduction 

Artificial neural networks (ANNs) are complex networks imitating the way human cerebral neurons process information to realize parallel information transformation and processing. ANNs have been widely employed to solve real-life problems related to classification, function approximation, data processing, and robotics. The training algorithm used to determine the various parameters of an ANN is one of the key factors that influence its performance. Among all training algorithms, backpropagation (BP) has been widely used, but it easily gets stuck in local optima and converges slowly [1]. With the advancement of evolutionary algorithms (EAs), EAs have been employed to train ANNs. Moreover, an EA can optimize the weights of an ANN and simultaneously evolve the structure of the network so as to achieve desirable performance 0.

With different levels of EA involvement, EA-based ANNs can be classified into two major types: “noninvasive” and “invasive”. The former use an EA to evolve the network structure in conjunction with BP for weight adaptation. The latter use an EA both for evolving the network structure and for weight adaptation 0. Because BP has traditionally been employed to train network weights, “noninvasive” methods have been widely studied, and many algorithms with outstanding performance have been developed 00-. Since they rely heavily on BP, however, they still suffer from getting stuck in local optima and from low convergence speed 0.

The “invasive” methods, however, depend solely on an EA for evolving the network. Their computation speed is therefore higher than that of “noninvasive” methods, because they avoid BP-based fitness evaluation by representing networks directly. In this paper, we propose a new algorithm based on Chemical Reaction Optimization (CRO) [8] to evolve the network structure and tune the network weights.

CRO is a novel, chemical-reaction-inspired, general-purpose optimization algorithm. It is a variable-population-based metaheuristic, mimicking the transition of molecules and intermolecular interactions in a chemical reaction. The transitions and interactions tend to direct molecules toward the lowest potential energy states on the potential energy surface (PES). CRO therefore maps the objective function landscape onto the PES, and, owing to this tendency, the molecules explore the solution space to find the global optimum. CRO has been shown to be effective in solving many practical problems 001101, and simulation results show that ANNs trained by CRO outperform those trained by other EAs in many classification problems.

The rest of the paper is organized as follows. In Section II, we briefly review related work on using EAs to train ANNs. In Section III, the problem formulation is presented. The detailed framework and the CRO-based algorithm are given in Section IV. Section V presents the simulation results comparing CRO-based ANNs (CROANN) with other ANNs. Finally, we conclude the paper in Section VI.

II. Related Work 

Using EAs to train ANNs has become an active research topic. Many EAs, e.g. the genetic algorithm (GA) he, simulated annealing (SA) 021, and particle swarm optimization (PSO) os, have been used. Yet relatively few “invasive” methods have been studied to achieve the best performance of EA-based neural networks. Sexton et al. used Tabu Search (TS) for neural network training Iff4l, where TS was used to train a fixed neural network with six hidden-layer neurons. The TS solution is given in the form of vectors representing all the weights of the network. The testing data set was a collection of randomly generated two-dimensional points $(x, y)$, where $x \in [-100, 100]$ and $y \in [-10, 10]$, and the output data set was generated by simple mathematical functions. The results demonstrated that TS-based networks could outperform conventional BP-derived networks. SA and GA were also implemented for the same data set EG).

Angeline et al. proposed GeNeralized Acquisition of Recurrent Links (GNARL), which uses a hybrid GA to train ANNs ED. Instead of using a symmetric topology, GNARL employs sparse connections to represent network structures. GNARL uses mutation both to evolve the structure and to tune the weights of networks. In each generation, GNARL retains the top 50% of individuals according to a user-defined fitness function and performs reproduction with two types of mutation: parametric mutation and structural mutation. The former perturbs the weights with Gaussian noise controlled by an annealing temperature [16], while the latter adds or deletes hidden-layer nodes or links.

A Constructive algorithm for training Cooperative Neural Network Ensembles (CNNE), proposed by M. Islam et al. [18], uses a constructive algorithm to evolve neural networks. CNNE relies on the contribution of individuals in the population and uses incremental learning to maintain diversity among the individuals in an ensemble. Incremental learning based on negative correlation can effectively reduce the redundancy generated by individuals searching the same solution space, so different individuals can learn different aspects of the training data, and the ensemble combines them into a final solution. CNNE is a “noninvasive” method that relies on a proper implementation of BP. Though CNNE alleviates optimization problems by utilizing ensembles, it suffers from the “structural hill-climbing” problem EG).

S. He et al. proposed a Group Search Optimizer-based ANN (GSOANN) [19], which uses the Group Search Optimizer, a population-based optimization algorithm inspired by animal social foraging behavior, to train the networks with a least-squares error function as the fitness function.

Paulito P. Palmes et al. proposed the mutation-based genetic neural network (MGNN), which employs a specially designed mutation strategy to perturb the chromosomes representing neural networks [4]. MGNN is very similar to GNARL except that it implements selection, encoding, mutation, the fitness function, and the stopping criteria differently. MGNN’s encoding scheme allows a flexible formulation of the fitness function and a mutation strategy based on the local adaptation of evolutionary programming, and MGNN implements a stopping criterion that uses a “sliding window” to track the state of overfitting.

III. Problem Formulation 

In this paper, we consider the problem of single-hidden-layer feedforward neural network (SLFN)¹ design for data classification.

¹SLFN is the original abbreviation used in fTJ to refer to a “single-hidden-layer feedforward neural network”.

Fig. 1. Single-hidden-layer Feedforward Neural Network

We use a topological structure, the activation functions of the nodes, and the connection weights to describe an SLFN. We use $l \in \{0, 1, 2\}$ to distinguish the different layers, where Layers 0 to 2 are the input layer, the hidden layer, and the output layer, respectively, and $n_l$ stands for the number of neurons in Layer $l$. $w_{k,pq}$ represents the weight of the connection from the $p$th neuron in Layer $k-1$ to the $q$th neuron in Layer $k$, and $b_{k,p}$ stands for the bias of the $p$th neuron in Layer $k$. Fig. 1 depicts an example of an SLFN. In the problem dataset $S$ with $|S|$ samples, the $i$th sample is composed of a pattern $S_i$ and the corresponding desired output $R_i$, where $i = 1, 2, \ldots, |S|$. We use $s_{i,m}$ to denote the $m$th element of the $i$th pattern and $r_{i,q}$ to denote the $q$th element of the $i$th desired output in the dataset. Given an SLFN and a pattern $S_i$, the computed result $C_{i,q}$ is obtained from

$$C_{i,q} = f_2\left(\sum_{p=1}^{n_1} w_{2,pq} \times f_1\left(\sum_{m=1}^{n_0} s_{i,m} \times w_{1,mp} + b_{1,p}\right) + b_{2,q}\right) \tag{1}$$

where $f_1$ and $f_2$ are the activation functions of the hidden-layer neurons and the output neurons, respectively. For a trained SLFN, we can use the difference between $R_i$ and $C_i$ to evaluate the performance of the network.
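Equation (1) can be illustrated with the short Python sketch below, which evaluates an SLFN on one pattern. It is a minimal illustration, assuming the weights are stored as nested lists indexed as in the text (`w1[m][p]` holds $w_{1,mp}$ and `w2[p][q]` holds $w_{2,pq}$) and a logistic sigmoid for both $f_1$ and $f_2$; the activation functions are not specified in this section, so that default is our assumption.

```python
import math
from typing import Callable, List, Sequence

Activation = Callable[[float], float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (assumed activation)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def slfn_forward(s_i: Sequence[float],
                 w1: Sequence[Sequence[float]], b1: Sequence[float],
                 w2: Sequence[Sequence[float]], b2: Sequence[float],
                 f1: Activation = sigmoid, f2: Activation = sigmoid) -> List[float]:
    """Return [C_{i,1}, ..., C_{i,n2}] for pattern s_i, following Eq. (1).

    w1 is n0 x n1 (input -> hidden), w2 is n1 x n2 (hidden -> output).
    """
    n0, n1, n2 = len(s_i), len(b1), len(b2)
    # Hidden layer: f1(sum_m s_{i,m} * w_{1,mp} + b_{1,p}) for each p
    hidden = [f1(sum(s_i[m] * w1[m][p] for m in range(n0)) + b1[p])
              for p in range(n1)]
    # Output layer: f2(sum_p w_{2,pq} * hidden_p + b_{2,q}) for each q
    return [f2(sum(w2[p][q] * hidden[p] for p in range(n1)) + b2[q])
            for q in range(n2)]
```

With identity activations, for example, `slfn_forward([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [[1.0], [1.0]], [0.5], f1=lambda x: x, f2=lambda x: x)` reduces Eq. (1) to plain weighted sums and returns `[3.5]`.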

The primary measure of the performance of a conventional BP network is the mean squared error (MSE) between $R_i$ and $C_i$. A small MSE means that the network performs well and is well trained. However, to concentrate more on the accuracy of the classification result, here we adopt a new fitness function $f_{\mathrm{fitness}}$ consisting of the normalized mean squared error (NMSE) $f_{\mathrm{NMSE}}$ and the classification percentage error $f_{\mathrm{percent}}$:

$$f_{\mathrm{fitness}} = \alpha \times f_{\mathrm{NMSE}} + \beta \times f_{\mathrm{percent}} \tag{2}$$

$$f_{\mathrm{percent}} = 100 \times \left(1 - \frac{N_{\mathrm{correct}}}{|S|}\right) \tag{4}$$

where $N_{\mathrm{correct}}$ is the number of samples in $S$ that the network classifies correctly. In (2), $\alpha$ and $\beta$ are user-defined parameters that balance the weights of $f_{\mathrm{NMSE}}$ and $f_{\mathrm{percent}}$ in the overall fitness function, and they should be set to real values between 0 and 1. For instance, implementations stressing classification correctness can set $\alpha = 1$ and $\beta = 0.7$. We also use this setting in our simulations.
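A minimal sketch of (2) and (4) follows. It assumes one output neuron per class and a winner-take-all decision (the predicted class is the index of the largest output), which is our assumption rather than something stated in the text. The NMSE term is passed in as a precomputed value, so the sketch does not commit to a particular normalization; the defaults $\alpha = 1$ and $\beta = 0.7$ are the setting used in this paper.

```python
from typing import Sequence


def percent_error(outputs: Sequence[Sequence[float]],
                  targets: Sequence[Sequence[float]]) -> float:
    """f_percent in (4): percentage of samples that are misclassified."""
    wrong = 0
    for c, r in zip(outputs, targets):
        predicted = max(range(len(c)), key=lambda q: c[q])  # winner-take-all (assumed)
        desired = max(range(len(r)), key=lambda q: r[q])
        if predicted != desired:
            wrong += 1
    return 100.0 * wrong / len(targets)


def fitness(f_nmse: float, f_percent: float,
            alpha: float = 1.0, beta: float = 0.7) -> float:
    """f_fitness in (2): weighted sum of the NMSE term and the percentage error."""
    return alpha * f_nmse + beta * f_percent
```

For example, if one of four samples is misclassified, `percent_error` gives 25.0, and with an NMSE value of 0.5 the fitness is 0.5 + 0.7 × 25.0 = 18.0.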

For CROANN, we divide the samples in a dataset into a training set, a validation set, and a testing set. We first determine the best $\{w_{1,mp}, w_{2,pq}, b_{1,p}, b_{2,q}\}$ quadruplet with the training set according to (2). Then we use the validation set to detect and avoid overfitting (see Section IV.D). Since ANNs should be able to process unfamiliar data, we use the testing set to evaluate the classification accuracy and compare the results with other algorithms.


IV. Algorithm Design 

In this section, we first discuss how CRO works. Then we introduce the encoding scheme and the operators employed to train the structure and weights of networks in CROANN. Finally, the stopping criteria are given.


A. Chemical Reaction Optimization 

CRO is a population-based metaheuristic inspired by chemical reactions, mimicking the process in which molecules collide with the walls of the container and with each other. In the process, molecules attempt to reach a stable state. Imagine that we put a certain number of molecules in a closed container. At the start of the reaction, molecules with excess energy are unstable. Since a reacting system naturally tends toward a stable state, molecules change their energy state from high to low through a sequence of elementary reactions. When the reaction stops, we obtain molecules in the minimum-energy stable state. If we regard the different energy states as an energy surface, the transitions and interactions of molecules result in a gradual rolling-down process on the PES, and the lowest point is the minimum-energy stable state. We call the initial molecules “reactants” and the final ones “products”.

In CRO, each molecule has a molecular structure, representing a solution of the problem, and two kinds of energy, i.e. potential energy (PE) and kinetic energy (KE). PE stands for the fitness function value, and KE describes the tolerance of a molecule to an increase of its energy state. Suppose $\omega$ and $f$ are a molecular structure and the fitness function, respectively; then its PE is computed as $PE_\omega = f(\omega)$. If $\omega'$ is the new structure derived from $\omega$ in an elementary reaction, then $PE_\omega + KE_\omega \geq PE_{\omega'}$ must be satisfied. Otherwise, the reaction is invalid and the new structure is rejected. In other words, KE represents the ability of a molecule to escape from a local minimum. This rule also applies to intermolecular elementary reactions, where the changes may be more vigorous since more KE can be transformed into PE. A central energy buffer is set up for energy conservation and convergence.
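To make the energy rule concrete, here is a hedged Python sketch of one uni-molecular step, an on-wall ineffective collision. The acceptance test is the condition $PE_\omega + KE_\omega \geq PE_{\omega'}$ stated above. How the surplus energy is split (a random fraction $q \in [\text{KELossRate}, 1]$ is kept as KE and the rest goes to the central buffer) follows our reading of the original CRO description [8] and should be treated as an assumption; the names `Molecule`, `fitness`, and `neighbour` are illustrative.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Molecule:
    structure: List[float]  # solution (e.g. flattened weights and biases)
    pe: float               # potential energy: fitness value of the structure
    ke: float               # kinetic energy: tolerance to accepting a worse structure


def on_wall_collision(mol: Molecule,
                      fitness: Callable[[List[float]], float],
                      neighbour: Callable[[List[float]], List[float]],
                      buffer: float,
                      ke_loss_rate: float = 0.1,
                      rng: Optional[random.Random] = None) -> float:
    """Try one on-wall ineffective collision; return the updated buffer energy."""
    if rng is None:
        rng = random.Random()
    candidate = neighbour(mol.structure)
    new_pe = fitness(candidate)
    total = mol.pe + mol.ke
    if total < new_pe:
        # PE + KE >= PE' is violated: the reaction is invalid and is rejected.
        return buffer
    surplus = total - new_pe
    q = rng.uniform(ke_loss_rate, 1.0)   # fraction of the surplus kept as KE
    mol.structure, mol.pe, mol.ke = candidate, new_pe, surplus * q
    return buffer + surplus * (1.0 - q)  # the lost KE goes to the central buffer
```

Because the surplus is divided between the molecule's KE and the buffer, the total energy is conserved in every accepted step, which is the role the central buffer plays in the description above.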

We define four types of elementary reactions for CRO, namely on-wall ineffective collision, decomposition, intermolecular ineffective collision, and synthesis. These four elementary reactions are designed to cover all possible reactions under the CRO framework. They can be classified into two classes: uni-molecular reactions include the first two types, and inter-molecular reactions include the latter two. A uni-molecular reaction is triggered when a single molecule collides with a wall of the container. An inter-molecular reaction happens when two or more molecules collide with each other (for simplicity, only two molecules are considered in this class of reactions). Interested readers can refer to [8] for the pseudocode of CRO.
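The choice among the four elementary reactions can be sketched as below. The tests used here (a uniform draw compared with the molecular collision rate, decomposition when a molecule has gone more than a threshold number of hits without improving its best solution, and synthesis when both colliding molecules have KE at or below a threshold) are our reading of the general CRO framework in [8], not details given in this paper; mapping them onto the thresholds in Table I is likewise an assumption.

```python
import random
from typing import List, Optional, Tuple


def choose_reaction(num_hit: List[int], min_hit: List[int], ke: List[float],
                    mole_coll: float = 0.1, alpha: int = 300, beta: float = 500.0,
                    rng: Optional[random.Random] = None) -> Tuple[str, Tuple[int, ...]]:
    """Pick an elementary reaction and the indices of the molecules involved.

    num_hit[i] - min_hit[i] is taken as the number of collisions molecule i
    has experienced since it last improved its best solution (hypothetical
    bookkeeping for this sketch).
    """
    if rng is None:
        rng = random.Random()
    n = len(ke)
    if rng.random() > mole_coll or n < 2:
        # Uni-molecular: a single molecule hits the container wall.
        i = rng.randrange(n)
        if num_hit[i] - min_hit[i] > alpha:
            return "decomposition", (i,)
        return "on_wall_ineffective_collision", (i,)
    # Inter-molecular: two distinct molecules collide.
    i, j = rng.sample(range(n), 2)
    if ke[i] <= beta and ke[j] <= beta:
        return "synthesis", (i, j)
    return "intermolecular_ineffective_collision", (i, j)
```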

B. Encoding 

To accelerate the simulation and reduce programming difficulty, we use two matrices and two vectors to represent the different weights and thresholds (biases). This scheme is similar to that described in g), with one key difference: in S), each solution has an extra element that controls the perturbation strength, but we abandon it since CRO uses one constant parameter to control the Gaussian perturbation. We call the complete collection of these matrices and vectors a “solution structure”. Every molecule has a solution structure, which represents the network structure and determines the molecule's current energy state.
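The “solution structure” can be sketched as a small Python container holding the two weight matrices and two bias vectors of an SLFN. The class and method names are hypothetical; the shapes follow the notation of Section III ($n_0$ inputs, $n_1$ hidden neurons, $n_2$ outputs), and `blocks()` shows how each matrix or vector can be handled by an operator as one flat list.

```python
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]


@dataclass
class SolutionStructure:
    """Hypothetical container for one molecule's solution."""
    w1: Matrix        # n0 x n1 input-to-hidden weights, w1[m][p] = w_{1,mp}
    b1: List[float]   # n1 hidden-layer biases (thresholds) b_{1,p}
    w2: Matrix        # n1 x n2 hidden-to-output weights, w2[p][q] = w_{2,pq}
    b2: List[float]   # n2 output-layer biases (thresholds) b_{2,q}

    @classmethod
    def zeros(cls, n0: int, n1: int, n2: int) -> "SolutionStructure":
        """All-zero structure with the given layer sizes."""
        return cls([[0.0] * n1 for _ in range(n0)], [0.0] * n1,
                   [[0.0] * n2 for _ in range(n1)], [0.0] * n2)

    def blocks(self) -> List[List[float]]:
        """Each matrix (flattened row by row) or vector as one flat list."""
        return [[x for row in self.w1 for x in row], list(self.b1),
                [x for row in self.w2 for x in row], list(self.b2)]

    def size(self) -> int:
        """Total number of tunable elements (weights plus biases)."""
        return sum(len(block) for block in self.blocks())
```

For an Iris-sized network with four inputs, three outputs and, say, five hidden neurons (a hypothetical choice), `SolutionStructure.zeros(4, 5, 3).size()` counts 4·5 + 5 + 5·3 + 3 = 43 elements.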

C. Operators 

1) Initial Solution Generator: This operator randomly generates a new network structure; each call produces a new solution structure. This is achieved by first assigning random numbers to all elements and then linearly scaling them to [−1.0, 1.0]. Its pseudocode is given in Algorithm 1 below:

Algorithm 1 InitialGen(ω)
for all matrices and vectors m in ω do
  for all elements e in m do
    Randomly generate a real number n
    e = n
  end for
  lo = min(m), hi = max(m)
  for all elements e in m do
    e = 2 × (e − lo)/(hi − lo) − 1
  end for
end for
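A runnable Python rendering of Algorithm 1 follows, assuming each matrix or vector is stored as one flat list of floats and a solution is a list of such blocks. As in the listing, the minimum and maximum are fixed once per block before rescaling, so the scale does not drift while elements are updated. The uniform draw in [0, 1) and the handling of one-element blocks, for which the rescaling formula would divide by zero, are our own choices.

```python
import random
from typing import List, Optional


def initial_gen(block_sizes: List[int],
                rng: Optional[random.Random] = None) -> List[List[float]]:
    """Algorithm 1: random values, then linear rescaling of each block to [-1, 1]."""
    if rng is None:
        rng = random.Random()
    solution = []
    for size in block_sizes:
        block = [rng.random() for _ in range(size)]  # any random reals would do
        lo, hi = min(block), max(block)              # fixed before rescaling
        if hi > lo:
            block = [2.0 * (e - lo) / (hi - lo) - 1.0 for e in block]
        else:
            # Not covered by Algorithm 1 (e.g. a single output bias): draw directly.
            block = [rng.uniform(-1.0, 1.0) for _ in block]
        solution.append(block)
    return solution
```

For a network with four inputs, five hidden neurons and three outputs (hypothetical sizes), the call would be `initial_gen([4 * 5, 5, 5 * 3, 3])`.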


2) Neighbor Generator: This operator is designed to generate a new solution $\omega'$ from a given solution $\omega$. Its main purpose is to perform a local search for better solutions. This is done by perturbing one random element of the matrices or vectors in $\omega$ with a Gaussian perturbation whose mean is the original value and whose variance is a user-defined value. Its pseudocode is given in Algorithm 2, where $p(\mu, v)$ denotes the Gaussian perturbation function, which returns a sample drawn from a normal distribution with mean $\mu$ and variance $v$, and $v$ is the user-defined variance.







Algorithm 2 Neighbour(ω)
Copy ω to ω'
Generate a random integer i smaller than the total number of elements in a solution
Find the ith element e in ω'
e = p(e, v)
return ω'
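A Python version of Algorithm 2 is sketched below, assuming a solution is a list of flat float lists (one per matrix or vector). It treats $v$ as a variance and therefore passes $\sqrt{v}$ to `random.Random.gauss`, which expects a standard deviation; if $v$ is meant as a standard deviation, it should be passed directly. The input is copied so that the original molecule is left unchanged.

```python
import math
import random
from typing import List, Optional


def neighbour(solution: List[List[float]], v: float,
              rng: Optional[random.Random] = None) -> List[List[float]]:
    """Algorithm 2: Gaussian perturbation of one randomly chosen element."""
    if rng is None:
        rng = random.Random()
    new = [list(block) for block in solution]   # copy: omega' is a new solution
    i = rng.randrange(sum(len(block) for block in new))
    for block in new:
        if i < len(block):
            block[i] = rng.gauss(block[i], math.sqrt(v))  # sample from N(e, v)
            break
        i -= len(block)                          # move on to the next block
    return new
```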


3) Decomposition: This operator generates two different solutions $\omega_1$ and $\omega_2$ from a given solution $\omega$. It helps molecules jump out of local minima by performing a severe perturbation on the solution. This is done by perturbing each element of the matrices and vectors in $\omega$ with a Gaussian perturbation with some probability, e.g. 50%. If, though unlikely, nothing is changed during this first stage, the solution is then perturbed by the Neighbor Generator. Its pseudocode is given in Algorithm 3, and the variables are as defined in the previous subsection.


Algorithm 3 Decomposition(ω)
Copy ω to ω1 and ω2
for all solutions ω' in {ω1, ω2} do
  change = false
  for all matrices and vectors m in ω' do
    for all elements e in m do
      Generate a real number r between 0 and 1
      if r > 0.5 then
        e = p(e, v)
        change = true
      end if
    end for
  end for
  if change = false then
    ω' = Neighbour(ω')
  end if
end for
return ω1 and ω2
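Algorithm 3 can be rendered in Python as follows, again assuming a solution is a list of flat float lists and treating $v$ as a variance. The default threshold of 0.5 mirrors the 50% example in the text; the helper `_neighbour` is a compact version of Algorithm 2 included only so that this block is self-contained.

```python
import math
import random
from typing import List, Optional, Tuple

Solution = List[List[float]]


def _neighbour(sol: Solution, v: float, rng: random.Random) -> Solution:
    """Algorithm 2: perturb one randomly chosen element."""
    new = [list(b) for b in sol]
    i = rng.randrange(sum(len(b) for b in new))
    for b in new:
        if i < len(b):
            b[i] = rng.gauss(b[i], math.sqrt(v))
            break
        i -= len(b)
    return new


def decompose(sol: Solution, v: float, rng: Optional[random.Random] = None,
              threshold: float = 0.5) -> Tuple[Solution, Solution]:
    """Algorithm 3: two heavily perturbed copies of sol."""
    if rng is None:
        rng = random.Random()
    children = []
    for _ in range(2):
        child = [list(b) for b in sol]
        changed = False
        for b in child:
            for k in range(len(b)):
                if rng.random() > threshold:   # perturb with probability 1 - threshold
                    b[k] = rng.gauss(b[k], math.sqrt(v))
                    changed = True
        if not changed:                        # unlikely: nothing was perturbed
            child = _neighbour(child, v, rng)
        children.append(child)
    return children[0], children[1]
```

Tracking `changed` separately for each child keeps the fallback to the Neighbor Generator per solution, as the text describes.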


4) Synthesis: This operator generates a new solution $\omega$ from two given solutions $\omega_1$ and $\omega_2$. This is done by taking each element from either solution with equal probability to form the new solution. Its pseudocode is given in Algorithm 4.


Algorithm 4 Synthesis(ω1, ω2)
for all matrices and vectors m in ω do
  for all elements e in m do
    Generate a real number r between 0 and 1
    if r > 0.5 then
      e = corresponding element of ω1
    else
      e = corresponding element of ω2
    end if
  end for
end for
return ω
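In Python, Algorithm 4 amounts to a uniform crossover over corresponding elements of the two parents. The sketch assumes both parents use the same layout (a list of flat float lists, one per matrix or vector); the function and parameter names are illustrative.

```python
import random
from typing import List, Optional


def synthesize(parent_a: List[List[float]], parent_b: List[List[float]],
               rng: Optional[random.Random] = None) -> List[List[float]]:
    """Algorithm 4: copy each element from parent_a or parent_b with equal probability."""
    if rng is None:
        rng = random.Random()
    return [[a if rng.random() > 0.5 else b for a, b in zip(block_a, block_b)]
            for block_a, block_b in zip(parent_a, parent_b)]
```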


D. Stopping Criteria 

We introduce two stopping criteria for CROANN: maximum function evaluations (FE) and overfitting detection. The maximum FE criterion is a hard limit of CROANN: no simulation may evaluate the fitness function more times than this threshold. The design of the other stopping criterion, overfitting detection, is based on the observation that good performance on the training samples does not necessarily result in good performance on the testing samples. Poor overall performance might be obtained due to over-training the system in the training phase. To address this problem, we employ a “sliding window” to monitor the presence of overfitting in the network. At the end of the $i$th window, CROANN measures the validation performance $\mathit{ValFitness}_i$ of the current best network and compares it with the previous best validation performance $\mathit{ValBest}_{i-1} = \min\{\mathit{ValFitness}_j \mid j < i\}$. If the previous validation performance is better, we say that the network is “overfitting” and add 1 to the overfitting counter. When this counter exceeds a user-defined threshold, CROANN terminates. However, once $\mathit{ValFitness}_i$ is smaller than $\mathit{ValBest}_{i-1}$, the counter is reset to zero. The pseudocode describing the stopping criteria is given in Algorithm 5 below:


Algorithm 5 Stopping Criteria(ValBest, CurrentNet)
if current FE exceeds maximum FE then
  CROANN stops
end if
Calculate the validation performance ValFitness
if ValFitness < ValBest then
  overFitCount = 0
  ValBest = ValFitness
  Store CurrentNet
else
  overFitCount = overFitCount + 1
  if overFitCount > overFitThres then
    Restore the saved best network
    CROANN stops
  end if
end if
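The logic of Algorithm 5 can be sketched as a small Python class that is called at the end of every sliding window. The class name, the use of `copy.deepcopy` to store the best network, and leaving the restoration of that network to the caller are illustrative choices; as in the listing, a lower validation fitness is taken to be better.

```python
import copy
from dataclasses import dataclass
from typing import Any


@dataclass
class OverfitMonitor:
    max_fe: int                     # hard limit on fitness-function evaluations
    over_fit_thres: int             # user-defined overfitting threshold
    val_best: float = float("inf")  # best validation fitness so far
    over_fit_count: int = 0
    best_net: Any = None            # copy of the best network on the validation set

    def should_stop(self, current_fe: int, val_fitness: float, current_net: Any) -> bool:
        """Algorithm 5, evaluated at the end of each window; True means stop."""
        if current_fe > self.max_fe:
            return True
        if val_fitness < self.val_best:       # validation improved: reset counter
            self.over_fit_count = 0
            self.val_best = val_fitness
            self.best_net = copy.deepcopy(current_net)
        else:                                 # no improvement: count it
            self.over_fit_count += 1
            if self.over_fit_count > self.over_fit_thres:
                return True                   # caller restores self.best_net
        return False
```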


V. Simulation Results 

In order to evaluate the performance of CROANN, the simulation is implemented in C++ and tested with three well-known classification datasets from the UCI repository [20]: the Iris classification dataset, the Wisconsin breast cancer classification dataset, and the Pima Indians diabetes dataset. They are all derived from real-world problems. The Iris dataset is a standard benchmark for evaluating the performance of ANNs and has been tested by many neural network algorithms, including some EA-based ANN algorithms. The latter two datasets are used to test the ability of CROANN to recover from polluted data 12. The datasets are partitioned based on the suggestion given by Prechelt ED: each of them is separated into three subsets, a training set, a validation set, and a testing set, in the ratio 2:1:1 using simple random sampling. First, several tests are conducted to evaluate the impact of changes in different CRO parameters, using the Iris classification dataset as the benchmark. Then CROANN is compared with other EA-based ANN algorithms proposed in the recent literature to evaluate its performance.

Fig. 2. Impact of CRO Parameters on Average Error Rate of Classification
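A minimal sketch of the 2:1:1 simple random sampling split is given below. The paper does not state how fractional sizes are rounded; the rule used here (half of the samples for training, then half of the remainder for validation) is an assumption, chosen because it reproduces the partition sizes reported later for the three datasets (75/37/38, 349/175/175, and 384/192/192).

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_2_1_1(samples: Sequence[T], rng: Optional[random.Random] = None
                ) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle, then cut into training, validation and testing sets (about 2:1:1)."""
    if rng is None:
        rng = random.Random()
    shuffled = list(samples)
    rng.shuffle(shuffled)                      # simple random sampling
    n_train = len(shuffled) // 2
    n_val = (len(shuffled) - n_train) // 2
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```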

A. Analysis of the impact of CROANN parameters 

The ratio of occurrence of the different elementary reactions of CRO and the initial size of the energy buffer directly affect the final performance of the neural network, so it is essential to analyze their impact in order to tune them for later use. The first experiment consists of a set of tests on different CRO parameter values, based on the Iris dataset, a benchmark for machine learning and pattern recognition. When one parameter is investigated, the other parameters are held constant. The result of each test is obtained by averaging the testing-set error rate over 50 trials. The analysis results are shown in Fig. 2.

Results show that a Gaussian perturbation variance that is too large makes CRO scan the solution space at a relatively large scale but prevents it from exploring small regions carefully, while a variance that is too small is likely to make the search get stuck in local optima. Similarly, a small population does not let the molecules fully explore the solution space, while a large population reduces the chance of reaction for each individual molecule. The initial energy buffer size, the initial molecular kinetic energy, and the kinetic energy loss rate control the overall energy of the whole system. Together with the molecular collision rate and the thresholds for decomposition and synthesis, they keep the ratio of occurrence of the different elementary reactions of CRO at a proper value.

TABLE I
CROANN Parameters

| Parameter | Iris | Cancer | Diabetes |
|---|---|---|---|
| Function Evaluation Limit | 50 000 | 50 000 | 172 800 |
| Max Window Count | 300 | 300 | 500 |
| Gaussian Perturbation Variance | 0.1 | 0.1 | 0.1 |
| Initial Population Size | 20 | 20 | 20 |
| Initial Energy Buffer Size | 0.0 | 0.0 | 0.0 |
| Initial Molecular Kinetic Energy | 100.0 | 100.0 | 100.0 |
| Molecular Collision Rate | 0.1 | 0.1 | 0.1 |
| Kinetic Energy Loss Rate | 0.1 | 0.1 | 0.1 |
| Decomposition Threshold | 300 | 300 | 300 |
| Synthesis Threshold | 500 | 500 | 500 |
| Number of Trials | 50 | 50 | 50 |
| Window Size | 100 | 100 | 100 |

B. Comparing CROANN with other EA-based ANN algorithms

For comparison, we employ six EA-based training algorithms, namely simple genetic algorithm (SGA) ANNs [22], evolutionary programming (EP) ANNs [23], [24], evolutionary strategies (ES) ANNs [25], particle swarm optimizer (PSO) ANNs [26], mutation-based genetic neural networks (MGNN) [4], and group search optimizer (GSO) ANNs [19]. We also compare CROANN with some other sophisticated or hybrid machine learning algorithms to check whether CROANN is competitive with them. Since there is no agreement on the maximum number of FEs in the previous literature, the maximum number of FEs is set to 50 000 for the first two datasets, according to the average maximum FEs given in a, while for the third dataset it is set to 172 800 according to Gil-. The other CRO parameters are determined based on the analysis in the previous section. Table I gives the collection of CROANN parameters used in the simulation.

TABLE II
Error Rate (%) of CROANN and Other ANNs on the Iris Dataset

| Algorithm | Training Mean | Training Std. | Training Min | Training Max | Validation Mean | Validation Std. | Validation Min | Validation Max | Testing Mean | Testing Std. | Testing Min | Testing Max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CROANN | 2.00 | 3.68 | 0.00 | 5.33 | 4.32 | 2.16 | 2.70 | 8.10 | 1.31 | 1.77 | 0.00 | 7.89 |
| SGAANN | 16.24 | 5.92 | 7.21 | 30.23 | - | - | - | - | 14.20 | 8.82 | 0.00 | 36.00 |
| EPANN | 18.54 | 6.47 | 7.69 | 29.77 | - | - | - | - | 12.56 | 8.42 | 0.00 | 32.00 |
| ESANN | 14.47 | 5.25 | 6.97 | 27.43 | - | - | - | - | 7.08 | 6.40 | 0.00 | 26.00 |
| PSOANN | 13.27 | 5.39 | 7.38 | 25.84 | - | - | - | - | 10.38 | 9.36 | 0.00 | 32.00 |
| GSOANN | 12.03 | 1.60 | 8.63 | 15.36 | - | - | - | - | 3.52 | 2.27 | 0.00 | 8.00 |
| MGNN | - | - | - | - | - | - | - | - | 4.68 | - | - | - |

TABLE III
Comparison between CROANN and Other Machine Learning Algorithms on the Iris Dataset

| Algorithm | Error Rate (%) |
|---|---|
| CROANN | 1.31 |
| GANet-best [27] | 6.40 |
| SVM-best [28] | 1.40 |
| CCSS [29] | 4.40 |

TABLE IV
Error Rate (%) of CROANN and Other ANNs on the Wisconsin Breast Cancer Dataset

| Algorithm | Training Mean | Training Std. | Training Min | Training Max | Validation Mean | Validation Std. | Validation Min | Validation Max | Testing Mean | Testing Std. | Testing Min | Testing Max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CROANN | 3.89 | 0.72 | 3.21 | 5.61 | 3.54 | 0.42 | 2.86 | 4.00 | 1.06 | 0.67 | 0.00 | 2.29 |
| SGAANN | 3.88 | 0.63 | 3.04 | 5.63 | 3.86 | 1.14 | 2.59 | 7.82 | 1.50 | 0.72 | 0.00 | 2.85 |
| EPANN | 3.58 | 0.63 | 3.03 | 6.18 | 3.30 | 1.45 | 1.85 | 8.99 | 1.54 | 1.16 | 0.57 | 6.29 |
| ESANN | 2.98 | 0.11 | 2.73 | 3.16 | 2.70 | 0.39 | 2.14 | 3.52 | 0.95 | 0.66 | 0.00 | 2.86 |
| PSOANN | 3.26 | 0.24 | 2.92 | 3.80 | 2.37 | 0.43 | 1.37 | 3.35 | 1.24 | 2.02 | 0.00 | 11.43 |
| GSOANN | 3.35 | 0.09 | 3.26 | 3.56 | 2.17 | 0.21 | 1.93 | 2.89 | 0.65 | 1.42 | 0.00 | 1.14 |
| MGNN | - | - | - | - | - | - | - | - | 3.05 | - | - | - |

TABLE V
Comparison between CROANN and Other Machine Learning Algorithms on the Wisconsin Breast Cancer Dataset

| Algorithm | Error Rate (%) |
|---|---|
| CROANN | 1.06 |
| GANet-best [27] | 1.06 |
| SVM-best [28] | 3.10 |
| CCSS [29] | 2.72 |
| COOP [30] | 1.23 |
| CNNE [18] | 1.20 |
| EPNet [2] | 1.38 |
| EDTs [31] | 2.63 |

1) Iris Dataset: The Iris dataset is the most widely used benchmark for machine learning and pattern recognition. The whole dataset is divided into three classes of iris species: Setosa, Versicolour, and Virginica. The species of an iris can be determined from four attributes of the plant: sepal length, sepal width, petal length, and petal width. The dataset is divided into three parts: 75 training samples, 37 validation samples, and 38 testing samples.

Results generated by CROANN, averaged over 50 trials, and those of six other ANNs are displayed in Table II. CROANN outperforms all other EA-based ANNs dramatically, in both training error and testing error. The comparison with recent machine learning algorithms in the literature is listed in Table III, where CROANN also achieves the best result.

2) Wisconsin Breast Cancer Dataset: The Wisconsin Breast Cancer dataset contains 699 samples, each described by real-valued attributes and belonging to one of two classes: 458 benign and 241 malignant. To test the performance of CROANN, the samples are divided into three parts by simple random sampling: 349 training samples, 175 validation samples, and 175 testing samples.

Results from CROANN and six other ANNs are listed in Table IV. CROANN outperforms SGAANN, EPANN, PSOANN, and MGNN, and performs similarly to ESANN. Compared with GSOANN, which attains a lower mean testing error, CROANN gives a smaller standard deviation. Comparisons with other machine learning algorithms from recently published work are given in Table V, where CROANN performs very well and shares the best performance with GANet-best.

TABLE VI
Error Rate (%) of CROANN and Other ANNs on the Pima Indian Diabetes Dataset

| Algorithm | Training Mean | Training Std. | Training Min | Training Max | Validation Mean | Validation Std. | Validation Min | Validation Max | Testing Mean | Testing Std. | Testing Min | Testing Max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CROANN | 16.55 | 2.73 | 15.89 | 18.23 | 16.04 | 3.01 | 14.58 | 17.71 | 19.67 | 5.38 | 17.19 | 23.44 |
| SGAANN | 17.73 | 0.96 | 16.05 | 20.67 | 16.48 | 1.23 | 14.61 | 19.44 | 24.46 | 3.75 | 20.31 | 35.94 |
| EPANN | 18.38 | 1.56 | 16.28 | 21.34 | 17.18 | 1.87 | 14.75 | 20.55 | 25.75 | 4.89 | 18.23 | 36.46 |
| ESANN | 15.85 | 0.28 | 15.32 | 16.37 | 14.26 | 0.35 | 13.34 | 16.51 | 20.93 | 1.76 | 17.19 | 25.52 |
| PSOANN | 16.25 | 0.19 | 15.76 | 16.77 | 14.74 | 0.47 | 14.20 | 16.51 | 20.99 | 1.47 | 18.23 | 23.96 |
| GSOANN | 16.43 | 0.21 | 15.97 | 16.80 | 14.82 | 0.21 | 14.37 | 15.21 | 19.79 | 0.96 | 17.19 | 21.88 |

TABLE VII
Comparison between CROANN and Other Machine Learning Algorithms on the Pima Indian Diabetes Dataset

| Algorithm | Error Rate (%) |
|---|---|
| CROANN | 19.67 |
| GANet-best [27] | 24.70 |
| SVM-best [28] | 22.70 |
| CCSS [29] | 24.02 |
| COOP [30] | 19.69 |
| CNNE [18] | 19.60 |
| EPNet [2] | 22.38 |
| EENCL [32] | 22.1 |

3) Pima Indian Diabetes Dataset: The Pima Indian Diabetes dataset contains 768 samples, 268 of which show signs of diabetes and 500 of which do not. Eight real-valued attributes can be used to determine whether a patient shows signs of diabetes. This dataset is known to be a difficult machine learning problem because of its scarcity of samples and heavy noise. It is partitioned into 384 training samples, 192 validation samples, and 192 testing samples.

Table VI shows the comparison between CROANN and five other EA-based ANNs (MGNN is not included because [4] provides no simulation data for this dataset). CROANN again outperforms the rest in terms of the mean testing-set error. Results from other machine learning algorithms are tabulated in Table VII, which shows that CROANN achieves a performance comparable with the best, i.e. CNNE, and outperforms the others.

VI. Conclusion 

In this paper, we propose a novel EA-based ANN, called CROANN, which trains ANNs based on CRO. We have shown that CROANN can optimize the structure and the weights of ANNs simultaneously using stochastic processes. In CROANN, the structure and weights of a network are encapsulated in one solution, which is considered as a point in the solution space. In this way, CROANN searches for the global minimum, which represents the network configuration providing the best performance. Since there are no restrictions on the evolution of the network structure or on the weight adaptation, CROANN does not suffer from the “structural hill-climbing” problem observed in the constructive and pruning approaches of ANNs a. Simulation results show that CROANN can outperform other existing EA-based ANN algorithms. On the Iris and Pima Indian Diabetes datasets, CROANN provides the best testing error rate among all representative EA-based ANNs. Although CROANN is not the best on the Wisconsin Breast Cancer dataset, the gap between its result and the best one is very small. In comparisons with other sophisticated machine learning algorithms on the three classification problems, CROANN always provides the best or near-best performance. To summarize, CRO is well suited to be incorporated in ANNs to solve classification problems. In the future, we will conduct a systematic analysis of variance on the parameters and perform Student’s t-tests to show the significance of the results. Moreover, we will further explore the ability of CROANN to solve real-world classification problems and other types of problems, including function approximation, data processing, and robotics.